Word Alignment-Based Reordering of Source Chunks in PB-SMT

نویسندگان

Santanu Pal

Sudip Kumar Naskar

Sivaji Bandyopadhyay

چکیده

Reordering poses a big challenge in statistical machine translation between distant language pairs. The paper presents how reordering between distant language pairs can be handled efficiently in phrase-based statistical machine translation. The problem of reordering between distant languages has been approached with prior reordering of the source text at chunk level to simulate the target language ordering. Prior reordering of the source chunks is performed in the present work by following the target word order suggested by word alignment. The testset is reordered using monolingual MT trained on source and reordered source. This approach of prior reordering of the source chunks was compared with pre-ordering of source words based on word alignments and the traditional approach of prior source reordering based on language-pair specific reordering rules. The effects of these reordering approaches were studied on an English–Bengali translation task, a language pair with different word order. From the experimental results it was found that word alignment based reordering of the source chunks is more effective than the other reordering approaches, and it produces statistically significant improvements over the baseline system on BLEU. On manual inspection we found significant improvements in terms of word alignments.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Bootstrapping Method for Chunk Alignment in Phrase Based SMT

The processing of parallel corpus plays very crucial role for improving the overall performance in Phrase Based Statistical Machine Translation systems (PBSMT). In this paper the automatic alignments of different kind of chunks have been studied that boosts up the word alignment as well as the machine translation quality. Single-tokenization of Noun-noun MWEs, phrasal preposition (source side o...

متن کامل

A Hybrid Word Alignment Model for Phrase-Based Statistical Machine Translation

This paper proposes a hybrid word alignment model for Phrase-Based Statistical Machine translation (PB-SMT). The proposed hybrid alignment model provides most informative alignment links which are offered by both unsupervised and semi-supervised word alignment models. Two unsupervised word alignment models (GIZA++ and Berkeley aligner) and a rule based aligner are combined together. The rule ba...

متن کامل

Using Shallow Syntax Information to Improve Word Alignment and Reordering for SMT

We describe two methods to improve SMT accuracy using shallow syntax information. First, we use chunks to refine the set of word alignments typically used as a starting point in SMT systems. Second, we extend an N -grambased SMT system with chunk tags to better account for long-distance reorderings. Experiments are reported on an Arabic-English task showing significant improvements. A human err...

متن کامل

A Discriminative Latent Variable-Based "DE" Classifier for Chinese-English SMT

Syntactic reordering on the source-side is an effective way of handling word order differences. The { (DE) construction is a flexible and ubiquitous syntactic structure in Chinese which is a major source of error in translation quality. In this paper, we propose a new classifier model — discriminative latent variable model (DPLVM) — to classify the DE construction to improve the accuracy of the...

متن کامل

Alignment-based reordering for SMT

We present a method for improving word alignment quality for phrase-based SMT by reordering the source text according to the target word order suggested by an initial word alignment. The reordered text is used to create a second word alignment which can be an improvement of the first alignment, since the word order is more similar. The method requires no other pre-processing such as part-of-spe...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2014

Word Alignment-Based Reordering of Source Chunks in PB-SMT

نویسندگان

چکیده

منابع مشابه

Bootstrapping Method for Chunk Alignment in Phrase Based SMT

A Hybrid Word Alignment Model for Phrase-Based Statistical Machine Translation

Using Shallow Syntax Information to Improve Word Alignment and Reordering for SMT

A Discriminative Latent Variable-Based "DE" Classifier for Chinese-English SMT

Alignment-based reordering for SMT

عنوان ژورنال:

اشتراک گذاری